The spectrum and scale of fluctuations in protein structures affect a range of cellular phenomena, including the stability of protein structures or their fragments, allosteric transitions and energy transfer. This study presents a statistical-thermodynamic analysis of the relationship between sequence composition and the distribution of residue fluctuations in protein-protein complexes. A one-node-per-residue elastic network model is developed that accounts for the nonhomogeneous protein mass distribution and for inter-atomic interactions through a renormalized inter-residue potential. Two factors, the protein mass distribution and the residue environment, were found to determine the scale of residue fluctuations. Surface residues undergo larger fluctuations than core residues, in agreement with experimental observations. Ranking residues on the normalized scale of fluctuations yields a distinct classification of amino acids into three groups: (i) highly fluctuating: Gly, Ala, Ser, Pro and Asp; (ii) moderately fluctuating: Thr, Asn, Gln, Lys, Glu, Arg, Val and Cys; (iii) weakly fluctuating: Ile, Leu, Met, Phe, Tyr, Trp and His. Structural instability in proteins possibly relates to a high content of highly fluctuating residues and a deficiency of weakly fluctuating residues in irregular secondary structure elements (loops), chameleon sequences and disordered proteins. A strong correlation between residue fluctuations and the sequence composition of protein loops supports this hypothesis. Comparing fluctuations of binding-site residues (interface residues) with those of other surface residues shows that, on average, the interface is more rigid than the rest of the protein surface, and that Gly, Ala, Ser, Cys, Leu and Trp have a propensity to form more stable docking patches on the interface. The findings have broad implications for understanding mechanisms of protein association and the stability of protein structures.
I. INTRODUCTION



A remarkable difference between the sequence compositions of regular and irregular secondary structure elements of proteins has been attracting considerable attention for more than 30 years [1-6]. Amino-acid composition profiles revealed that the irregular regions (protein loops) are enriched in Gly, Pro, Ser and Asp, whereas the regular regions (α-helices and β-strands) contain less of these amino acids. Helices are enriched in Leu, Ala, Glu and Gln, and β-strands are enriched in Val, Ile, Phe and Tyr. The amino-acid compositions of protein interfaces have also been analyzed [7-10]. Despite the extensive use of these statistics in almost all aspects of protein modeling (e.g., in computational algorithms for secondary structure assignment (see [11] for a review), in knowledge-based approaches to protein structure prediction [12-17], and in receptor-ligand docking [18-20]), the mechanisms that shape amino-acid propensities are still incompletely understood and pose a challenge for researchers in physics and biology. Recent discoveries of chameleon sequences, which undergo helix-sheet transitions [21-26], and of intrinsically disordered proteins or fragments, which undergo order-disorder transitions [27-29], have added interest to the problem. Studying the distribution, scale and features of structural and thermal fluctuations in proteins is one way to tackle this puzzle.

Protein functionality, encoded in the sequence, is based on the dual ability of proteins to sustain and to change their structures [30]. This relationship has different degrees of sensitivity to the location and scale of structural changes (e.g., CH3 group rotations, conversions of side-chain rotamers, cis-trans isomerization of proline, or domain shifts). The last ten years have seen a growing popularity of low-resolution, or coarse-grained, models combined with harmonic potentials, called elastic network models (ENMs), for deciphering and modeling various large-scale structural changes (e.g., allosteric changes in protein structures [31-34], structural changes along transition pathways [37, ...], and global conformational changes upon protein-protein binding [...]). Other applications of these models include the analysis of Debye-Waller factors of Cα atoms [...] and protein docking [...].

Two types of ENMs are widely used: homogeneous and nonhomogeneous models. A homogeneous ENM is a network of nodes represented by Cα atoms and connected by Hookean springs if the distance between nodes is less than a cutoff radius [...]. All network nodes are assigned an equal mass, which smooths the protein mass density. The homogeneous ENM has only two parameters, the cutoff radius and the spring force constant. Nonhomogeneous ENMs introduce structural and interaction inhomogeneity by assigning residue masses to the network nodes represented by Cα atoms [48], or by assigning distance- or residue-type-dependent force constants to interacting nodes [...]. The effect of protein sequence variations on the spring force constants has been considered recently [33]. Double-well ENMs are used to model large-scale conformational transition pathways [37, ...]. Merging residues into rigid blocks makes it possible to treat large macromolecules within an ENM of lower resolution [...]. Less "extreme" coarse graining keeps the three translational degrees of freedom of Cα-based nodes together with the degrees of freedom of bond angles and dihedrals (see [51, 53, 54]).

In the context of nonhomogeneous ENMs, we present a novel method to account for the protein mass distribution and inter-atomic contacts within the coarse-grained model. We move network nodes from Cα atoms to the centers of mass of protein residues to bring the effects of side chains into the model. We derive a modified Tirion-like potential [55] to incorporate structural details at the atomic level and put forward a statistical-thermodynamic formalism to calculate residue fluctuations for a set of protein complexes [56]. We show that the scale of residue fluctuations increases from the protein interior to the protein surface, in agreement with the Frauenfelder-Petsko-Tsernoglou model [62]. We suggest a classification of protein residues based on the normalized scale of fluctuations and discuss how the scale of fluctuations correlates with amino-acid propensities in secondary structure elements, chameleon sequences and disordered fragments. Fluctuations of binding-site residues (interface residues) are compared with those of other surface residues. We discuss the tendency of some residues to form more stable docking patches on the interface, as well as the role of loops at the early stages of protein thermal denaturation.



II. MODEL 

A modified nonhomogeneous ENM is used in the calculations. Network nodes are placed at the centers of mass of protein residues, and residue masses are assigned to the corresponding network nodes. The following is a description of a formalism that consistently transforms the inter-atomic protein energy landscape into the inter-residue landscape. As a result, we obtain a modified inter-residue harmonic potential with a spring force constant proportional to the number of inter-atomic contacts between residues (see Eq. (3) below).

The interaction energy between protein residues i and k is

$$U_{ik}(\mathbf{R}_i - \mathbf{R}_k) = \sum_{\alpha,\beta} U_{\alpha\beta}\left(\mathbf{R}_i + \mathbf{u}^i_\alpha - \mathbf{R}_k - \mathbf{u}^k_\beta\right), \tag{1}$$

where $\mathbf{R}_{i,k}$ are the radius vectors of the centers of mass of residues i and k, and $\mathbf{u}^i_\alpha$, $\mathbf{u}^k_\beta$ are the radius vectors of atoms α and β relative to the centers of mass of residues i and k, respectively. The sum in Eq. (1) runs over all pairs of atoms separated by a distance less than the interaction cutoff. We use a cutoff of 14 Å, which ensures a tolerable level of cutoff-related ruggedness of the energy landscape [...]. Introducing a residue-residue potential, one can rewrite Eq. (1) as $U_{ik}(\mathbf{R}_i - \mathbf{R}_k) = N_{ik}V(\mathbf{R}_i - \mathbf{R}_k)$, where $N_{ik}$ is the number of inter-atomic interactions between residues i and k, and V is the coarse-grained, or inter-residue, potential proper. Assuming that inter-residue interactions are in equilibrium in a native protein and using a Lennard-Jones form of the inter-residue potential, we can expand $V(\mathbf{R}_i - \mathbf{R}_k)$ in a Taylor series in the deviations $R_{ik} - R^0_{ik}$ of the inter-residue distance $R_{ik} = |\mathbf{R}_i - \mathbf{R}_k|$ from its equilibrium value $R^0_{ik}$. Expanding to second order in $R_{ik} - R^0_{ik}$ yields

$$U_{ik}(\mathbf{R}_i - \mathbf{R}_k) = -N_{ik}\varepsilon + 36N_{ik}\varepsilon\left(\frac{R_{ik} - R^0_{ik}}{R^0_{ik}}\right)^2, \tag{2}$$

where ε is the depth of the Lennard-Jones potential. Eq. (2) shows that the harmonic part of the inter-residue interaction is proportional to the number of inter-atomic interactions, and that its stiffness decreases with increasing equilibrium inter-residue distance as $(R^0_{ik})^{-2}$.
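To make the coefficient 36 in Eq. (2) concrete, the short check below assumes the standard 12-6 Lennard-Jones form $V(R) = \varepsilon[(R^0/R)^{12} - 2(R^0/R)^6]$, whose minimum $-\varepsilon$ lies at $R = R^0$; the text does not write this form out, so it is an assumption. Expanding to second order gives $V \approx -\varepsilon + \tfrac{1}{2}V''(R^0)(R - R^0)^2$ with $V''(R^0) = 72\varepsilon/(R^0)^2$, i.e. a coefficient of 36 in front of $\varepsilon[(R - R^0)/R^0]^2$. The sketch recovers that number with a central finite difference; the function names are illustrative.

```python
def lennard_jones(r, r0=1.0, eps=1.0):
    """Assumed 12-6 Lennard-Jones form with its minimum, -eps, at r = r0."""
    x = (r0 / r) ** 6
    return eps * (x * x - 2.0 * x)


def harmonic_coefficient(r0=1.0, eps=1.0, h=1e-4):
    """Estimate c in V(R) ~ -eps + c * eps * ((R - R0) / R0)**2.

    Uses a central finite difference for V''(R0); the Taylor expansion gives
    c = V''(R0) * R0**2 / (2 * eps), which equals 36 analytically.
    """
    second_derivative = (
        lennard_jones(r0 + h, r0, eps)
        - 2.0 * lennard_jones(r0, r0, eps)
        + lennard_jones(r0 - h, r0, eps)
    ) / h**2
    return 0.5 * second_derivative * r0**2 / eps


if __name__ == "__main__":
    # The coefficient does not depend on the choice of R0 or eps.
    for r0, eps in [(1.0, 1.0), (6.5, 0.3)]:
        print(r0, eps, round(harmonic_coefficient(r0, eps), 3))
```

Because the coefficient is independent of $R^0$ and ε, all distance dependence of the spring in Eq. (2) enters through the explicit $(R^0_{ik})^{-2}$ factor.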

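Since $\mathbf{R}_{i,k} = \mathbf{R}^0_{i,k} + \mathbf{r}_{i,k}$, to leading order in the deviations we have $R_{ik} - R^0_{ik} \approx \mathbf{R}^0_{ik}\cdot(\mathbf{r}_i - \mathbf{r}_k)/R^0_{ik}$ and obtain

$$U_{ik}(\mathbf{r}_i - \mathbf{r}_k, \mathbf{R}^0_{ik}) = -\varepsilon N_{ik} + 36\varepsilon N_{ik}\left(\frac{\mathbf{R}^0_{ik}\cdot(\mathbf{r}_i - \mathbf{r}_k)}{(R^0_{ik})^2}\right)^2, \tag{3}$$

where $\mathbf{R}^0_{ik} = \mathbf{R}^0_i - \mathbf{R}^0_k$ and $\mathbf{r}_{i,k}$ are the deviations of the residue centers of mass from their equilibrium positions. The main difference between Eq. (3) and the Tirion-like potentials [55] used in nonhomogeneous ENMs is the factor $N_{ik}$, which introduces the distribution of inter-atomic interactions into the coarse-grained model. In other words, changing the resolution of the protein model from the atomic to the residue level results in the appearance of this factor in the inter-residue potential.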
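The factor $N_{ik}$ and the spring constant it produces, $C_{ik} = 72\varepsilon N_{ik}/(R^0_{ik})^2$ (the stiffness of the harmonic term in Eq. (3)), can be computed from atomic coordinates roughly as sketched below. These are hypothetical helpers, not the authors' program: they assume atoms are given as (x, y, z) tuples in ångströms, count every atom pair (α in residue i, β in residue k) closer than the 14 Å cutoff, and place network nodes at mass-weighted residue centers.

```python
import math
from itertools import product

CUTOFF = 14.0  # interaction cutoff in angstroms, as chosen in the text


def contact_count(atoms_i, atoms_k, cutoff=CUTOFF):
    """N_ik: number of atom pairs (alpha in residue i, beta in residue k) closer than cutoff."""
    return sum(1 for a, b in product(atoms_i, atoms_k) if math.dist(a, b) < cutoff)


def center_of_mass(atoms, atom_masses):
    """Mass-weighted residue center; network nodes sit here instead of at C-alpha."""
    total = sum(atom_masses)
    return tuple(
        sum(m * atom[c] for atom, m in zip(atoms, atom_masses)) / total for c in range(3)
    )


def spring_constant(atoms_i, masses_i, atoms_k, masses_k, eps=1.0, cutoff=CUTOFF):
    """C_ik = 72 * eps * N_ik / (R0_ik)**2 for one residue pair (eps in arbitrary units)."""
    n_ik = contact_count(atoms_i, atoms_k, cutoff)
    r0 = math.dist(center_of_mass(atoms_i, masses_i), center_of_mass(atoms_k, masses_k))
    return 72.0 * eps * n_ik / r0**2


# Toy example with two made-up "residues" of two atoms each.
res_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
res_b = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_count(res_a, res_b))  # pairs at 10 and 9 angstroms count; 20 and 19 do not
print(spring_constant(res_a, [12.0, 12.0], res_b, [12.0, 12.0]))
```

`math.dist` requires Python 3.8 or later. The brute-force pair loop is adequate for a sketch; for whole proteins a neighbor list or cell grid would be the usual choice.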
The protein Lagrangian

$$\mathcal{L} = \sum_{i=1}^{N}\frac{m_i}{2}\,\dot{\mathbf{r}}_i^{\,2} - \frac{1}{2}\sum_{i,k=1}^{N}{}' U_{ik}\left(\mathbf{r}_i - \mathbf{r}_k, \mathbf{R}^0_{ik}\right) \tag{4}$$

yields the following 3N equations of motion:

$$m_i\,\ddot{\mathbf{r}}_i = -\sum_{k=1}^{N}{}' C_{ik}\left[\boldsymbol{\alpha}_{ik}\cdot(\mathbf{r}_i - \mathbf{r}_k)\right]\boldsymbol{\alpha}_{ik}, \tag{5}$$

where $m_i$ is the mass of residue i, $\boldsymbol{\alpha}_{ik} = \mathbf{R}^0_{ik}/R^0_{ik}$ is the unit vector along the equilibrium inter-residue direction, $C_{ik} = 72\varepsilon N_{ik}/(R^0_{ik})^2$, and N is the number of protein residues. As usual, we seek an oscillatory solution of the form $\mathbf{r}_k = \mathbf{A}_k\exp(i\omega t)$, where $\mathbf{A}_k$ are amplitude factors to be determined. Substituting the trial solution into the equations of motion leads to the eigenvalue problem $(H - \omega^2 I)A = 0$, where $A = \{A^x_1, A^y_1, A^z_1, A^x_2, A^y_2, \ldots\}$ is a 3N column vector of amplitude factors, I is the 3N × 3N unit matrix, and H is a 3N × 3N matrix composed of 3 × 3 superelements

$$H_{ik} = \left[h^{ab}_{ik}\right]\ (k \neq i), \qquad H_{ii} = -\sum_{k=1}^{N}{}' H_{ik}, \tag{6}$$

where $h^{ab}_{ik} = -C_{ik}\,\alpha^a_{ik}\alpha^b_{ik}/m_i$ and the superscripts a, b stand for the x, y, z projections of the vector $\boldsymbol{\alpha}_{ik}$.
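A minimal sketch of how the matrix H of Eq. (6) can be assembled from equilibrium centers of mass, residue masses and contact numbers is given below. It is an illustration under stated assumptions (plain Python lists, ε = 1 in arbitrary units, hypothetical function names), not the AH program used in this work. Because $H_{ii} = -\sum'_k H_{ik}$, a rigid translation of all residues is a zero-frequency solution; the usage lines check this, together with the stretching frequency of a two-residue network, $\omega^2 = C_{12}(1/m_1 + 1/m_2)$.

```python
import math


def h_matrix(coords, masses, counts, eps=1.0):
    """Assemble the 3N x 3N matrix H of Eq. (6).

    coords: equilibrium residue centers of mass R0_i as (x, y, z) tuples
    masses: residue masses m_i
    counts: symmetric N x N matrix of inter-atomic contact numbers N_ik
    """
    n = len(coords)
    h = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for k in range(n):
            if k == i or counts[i][k] == 0:
                continue  # primed sum (k != i); no contacts means no spring
            d = [coords[i][a] - coords[k][a] for a in range(3)]
            r0 = math.sqrt(sum(x * x for x in d))
            alpha = [x / r0 for x in d]  # unit vector alpha_ik
            c_ik = 72.0 * eps * counts[i][k] / r0**2
            for a in range(3):
                for b in range(3):
                    h_ab = -c_ik * alpha[a] * alpha[b] / masses[i]
                    h[3 * i + a][3 * k + b] = h_ab  # superelement H_ik
                    h[3 * i + a][3 * i + b] -= h_ab  # H_ii = -sum'_k H_ik
    return h


def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


# Two residues 4 angstroms apart on the x axis (Gly-like and Trp-like masses).
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
masses = [57.0, 186.0]
counts = [[0, 10], [10, 0]]
h = h_matrix(coords, masses, counts)

translation = [1.0, 0.0, 0.0] * 2
print(max(abs(x) for x in mat_vec(h, translation)))  # rigid shift: zero frequency

c12 = 72.0 * 10 / 4.0**2
stretch = [masses[1], 0.0, 0.0, -masses[0], 0.0, 0.0]
omega2 = c12 * (1 / masses[0] + 1 / masses[1])
print(all(abs(x - omega2 * s) < 1e-9 for x, s in zip(mat_vec(h, stretch), stretch)))
```

Note that H itself is not symmetric, because each row is divided by its own residue mass; a symmetric form suitable for standard eigensolvers is obtained by mass-weighting both sides.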

The prime on the sums over k in Eqs. (4)-(6) means that the term k = i is excluded. We use our program AH (Analyzer of Harmonics) to find the protein eigenfrequencies {ω} and normalized eigenvectors. The displacement along the kth coordinate can be written in the form

$$x_k = \sum_{i=1}^{3N} G_{ki}\,\mathrm{Re}\left[c_i\exp(i\omega_i t)\right] = \sum_{i=1}^{3N} G_{ki}\,\Theta_i, \tag{7}$$

where $\Theta_i = \mathrm{Re}[c_i\exp(i\omega_i t)]$ is the so-called normal coordinate, Re stands for "real part of," $c_i$ is a constant determined by the initial conditions, and the columns of the matrix G are the normalized eigenvectors. The normal modes are described by the Hamiltonian

$$\mathcal{H} = \sum_{i=1}^{3N-6}\frac{M_i}{2}\left(\dot{\Theta}_i^2 + \omega_i^2\Theta_i^2\right), \tag{8}$$

where $M_i = \sum_{k=1}^{3N} m_k G_{ki}^2$ is the effective mass of the ith normal mode [...], with $m_k$ the mass of the residue to which coordinate k belongs. Note that for a homogeneous ENM, $m_k$ is a constant equal to some parameter m and, therefore, all modes have equal effective masses: $M_i = \sum_{k=1}^{3N} m\,G_{ki}^2 = m$.

The mean-square fluctuation of the kth residue along the coordinate axis x is $\langle x_k^2\rangle = \sum_{i,j} G_{k_x i}G_{k_x j}\langle\Theta_i\Theta_j\rangle$, where the angular brackets denote a Boltzmann average with the Hamiltonian (8) over the normal modes, and $k_{x,y,z}$ are the indices of the degrees of freedom associated with the oscillations of the residue center of mass along the coordinate axes x, y, z. Boltzmann averaging of the pair products $\langle\Theta_i\Theta_j\rangle$ of normal coordinates yields $\langle\Theta_i\Theta_j\rangle = \delta_{ij}k_B T/(M_i\omega_i^2)$, where T is the temperature, $k_B$ is the Boltzmann constant and $\delta_{ij}$ is the Kronecker delta ($\delta_{ij} = 1$ if i = j and $\delta_{ij} = 0$ if i ≠ j). The total mean-square fluctuation of the kth residue has the form

$$\langle r_k^2\rangle = k_B T\sum_{i=1}^{3N-6}\frac{G_{k_x i}^2 + G_{k_y i}^2 + G_{k_z i}^2}{M_i\,\omega_i^2}. \tag{9}$$

It is important to note that the residue fluctuation given by Eq. (9) depends nonlocally on the mass distribution in the protein. This effect disappears entirely in the framework of a homogeneous ENM.

To remove the effect of the parameter ε on residue fluctuations, we introduce the mobility ratio (MR) of the kth residue in the form

$$\mathcal{R}_k = \frac{\langle r_k^2\rangle}{r^2_{av}}, \tag{10}$$

where $r^2_{av} = \frac{1}{N}\sum_{k=1}^{N}\langle r_k^2\rangle$ is the average mean-square fluctuation in a protein.

We computed the mobility ratios for each residue in 184 proteins from the 92 non-obligate protein-protein complexes selected from a docking benchmark set [56]. For each protein, the MRs were grouped into twenty groups according to the standard amino-acid types, and twenty average MRs were computed. The obtained values were then averaged over the set of 184 protein structures. Figures 1-3 show the mean MRs and the standard deviations of the mean.
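The sketch below follows Eqs. (8)-(10) numerically. It is written in plain Python with a small cyclic Jacobi eigensolver so that it runs without external libraries (an implementation choice for illustration, not the AH program), and it relies on a standard equivalence: the symmetric matrix $D = M^{-1/2}KM^{-1/2}$, where K is the stiffness matrix (so that $H = M^{-1}K$), has the same eigenvalues $\omega_i^2$ as H, and its eigenvectors map to those of H through $M^{-1/2}$. Zero-frequency (rigid-body) modes are skipped with a relative threshold rather than by counting 3N - 6 modes, which also handles small degenerate toy networks. Temperature in kelvin, $k_B$ in kcal/(mol K) and ε = 1 are assumed units; they cancel in the mobility ratio.

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol K); the unit choice is an assumption


def jacobi_eigh(a, tol=1e-24, max_sweeps=100):
    """Eigenvalues and column eigenvectors of a small symmetric matrix (cyclic Jacobi)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def stiffness(coords, counts, eps=1.0):
    """3N x 3N matrix K with H = M^-1 K; springs C_ik = 72 eps N_ik / (R0_ik)^2."""
    n = len(coords)
    k = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            if counts[i][j] == 0:
                continue
            d = [coords[i][a] - coords[j][a] for a in range(3)]
            r0 = math.sqrt(sum(x * x for x in d))
            al = [x / r0 for x in d]
            c = 72.0 * eps * counts[i][j] / r0**2
            for a in range(3):
                for b in range(3):
                    w = c * al[a] * al[b]
                    k[3 * i + a][3 * j + b] -= w
                    k[3 * j + a][3 * i + b] -= w
                    k[3 * i + a][3 * i + b] += w
                    k[3 * j + a][3 * j + b] += w
    return k


def mean_square_fluctuations(coords, masses, counts, temperature=300.0, eps=1.0, rel_tol=1e-8):
    """<r_k^2> of Eq. (9) for every residue k (zero-frequency modes skipped)."""
    n = len(coords)
    m = [masses[p // 3] for p in range(3 * n)]  # residue mass attached to each coordinate
    k = stiffness(coords, counts, eps)
    d = [[k[p][q] / math.sqrt(m[p] * m[q]) for q in range(3 * n)] for p in range(3 * n)]
    omega2, u = jacobi_eigh(d)
    cutoff = rel_tol * max(omega2)
    msf = [0.0] * n
    for i, w2 in enumerate(omega2):
        if w2 <= cutoff:
            continue  # rigid-body mode, excluded from the sum in Eq. (9)
        g = [u[p][i] / math.sqrt(m[p]) for p in range(3 * n)]  # eigenvector of H
        norm = math.sqrt(sum(x * x for x in g))
        g = [x / norm for x in g]  # normalized column of G
        big_m = sum(mp * gp * gp for mp, gp in zip(m, g))  # effective mass M_i
        for res in range(n):
            weight = sum(g[3 * res + a] ** 2 for a in range(3))
            msf[res] += KB * temperature * weight / (big_m * w2)
    return msf


def mobility_ratios(msf):
    """Eq. (10): R_k = <r_k^2> / r_av^2 with r_av^2 the mean over residues."""
    r2_av = sum(msf) / len(msf)
    return [x / r2_av for x in msf]


# Two-residue toy network: Gly-like and Trp-like masses, 10 contacts, 4 angstroms apart.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
msf = mean_square_fluctuations(coords, [57.0, 186.0], [[0, 10], [10, 0]])
print(mobility_ratios(msf))  # the lighter residue gets the larger mobility ratio
```

For this two-residue case the sketch reproduces the closed form $\langle r_1^2\rangle = k_BT\,m_2^2/[C_{12}(m_1+m_2)^2]$, so $\langle r_1^2\rangle/\langle r_2^2\rangle = (m_2/m_1)^2$. Doubling T doubles every $\langle r_k^2\rangle$ in Eq. (9) but leaves the mobility ratios unchanged. For realistic proteins a LAPACK-backed symmetric eigensolver would replace the Jacobi routine.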



III. RESULTS 

The results show that large equilibrium fluctuations ($\mathcal{R} > 1$) of protein structures are associated with oscillations of the centers of mass of Gly, Ala, Ser, Pro and Asp (Group I), which, with the exception of Asp, are the lightest residues (Fig. 1). Moderate fluctuations ($\mathcal{R}$ between 0.7 and 1.0; Group II) are associated with six polar residues (Thr, Asn, Gln, Lys, Glu, Arg) and two nonpolar residues (Val, Cys). Small fluctuations ($\mathcal{R}$ between 0.3 and 0.7; Group III) are associated with five nonpolar residues (Ile, Leu, Met, Phe, Trp) and the polar residues His and Tyr. It is interesting to note that, with regard to hydrophilicity, groups I, II and III can be characterized as mixed, mostly polar and mostly nonpolar, respectively.
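The averaging protocol of Section II (per-protein means for each amino-acid type, then means over proteins with the standard deviation of the mean) and the three groups defined above can be sketched as follows. The input format (lists of residue-name and mobility-ratio pairs), the assignment of the shared boundary value 0.7 to Group II, and the function names are assumptions for illustration; the standard deviation of the mean is taken here as the sample standard deviation divided by the square root of the number of proteins.

```python
from collections import defaultdict
from statistics import mean, stdev


def residue_type_means(proteins):
    """proteins: list of proteins, each a list of (residue_name, mobility_ratio) pairs.

    Returns {residue_name: (mean over proteins, standard deviation of the mean)}.
    """
    per_protein = defaultdict(list)
    for residues in proteins:
        by_type = defaultdict(list)
        for name, mr in residues:
            by_type[name].append(mr)
        for name, values in by_type.items():
            per_protein[name].append(mean(values))  # one average per protein and type
    summary = {}
    for name, values in per_protein.items():
        sem = stdev(values) / len(values) ** 0.5 if len(values) > 1 else float("nan")
        summary[name] = (mean(values), sem)
    return summary


def fluctuation_group(mr):
    """Groups from the text: I (MR > 1), II (about 0.7 to 1.0), III (about 0.3 to 0.7)."""
    if mr > 1.0:
        return "I"
    if mr >= 0.7:  # assumption: the shared boundary 0.7 is assigned to Group II
        return "II"
    if mr >= 0.3:
        return "III"
    return "outside reported range"


# Made-up mobility ratios for two small proteins (illustration only, not data from this work).
toy = [[("Gly", 1.4), ("Gly", 1.2), ("Trp", 0.4)], [("Gly", 1.0), ("Trp", 0.6)]]
for name, (avg, sem) in sorted(residue_type_means(toy).items()):
    print(name, round(avg, 3), round(sem, 3), fluctuation_group(avg))
```

Averaging within each protein first gives every structure equal weight, regardless of how many residues of a given type it contains.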

Analysis of the scale of fluctuations of surface and core residues shows that, on average, all surface residues demonstrate larger fluctuations than core residues (Fig. 2). Surface (core) residues are defined here as those residues which have a relative solvent accessible surface area higher (lower) than 25%; they are identified using NACCESS [...]. The difference is readily explained by the different numbers of nearest neighbors of surface and core residues (the environment effect). In comparison with core residues, surface residues have fewer nearest neighbors [61]. Therefore, they are less restricted and experience larger fluctuations. The first reports of this effect go back to crystallographic studies of myoglobin [62, 63] and lysozyme [64]. It has been shown that atomic mean-square displacements increase from the protein core to the protein surface. Frauenfelder et al. [62] suggested that, in general, proteins have a condensed core and a semi-liquid surface.
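The surface/core split used for Fig. 2 amounts to a threshold on relative solvent accessibility. A minimal sketch, assuming per-residue relative accessibilities (in percent, for instance computed with NACCESS as in the text) have already been paired with mobility ratios; residues at exactly 25% are left unassigned because the text defines both classes by strict inequalities. The input values are invented for illustration.

```python
from statistics import mean

RSA_THRESHOLD = 25.0  # percent relative solvent accessibility, as in the text


def surface_core_means(residues, threshold=RSA_THRESHOLD):
    """residues: iterable of (relative_accessibility_percent, mobility_ratio) pairs.

    Returns (mean MR of surface residues, mean MR of core residues).
    Raises statistics.StatisticsError if either class is empty.
    """
    residues = list(residues)
    surface = [mr for rsa, mr in residues if rsa > threshold]
    core = [mr for rsa, mr in residues if rsa < threshold]
    return mean(surface), mean(core)


# Invented values: two surface, two core, and one boundary residue that is skipped.
example = [(40.0, 1.3), (10.0, 0.6), (60.0, 1.1), (5.0, 0.8), (25.0, 5.0)]
print(surface_core_means(example))
```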
FIG. 1: The mobility ratios of protein residues arranged in the order of increasing mass.



The same environment effect appears as a small root-mean-square deviation between the bound and unbound states of pocket side chains [65], or as a decreased number of allowable rotamers for buried amino acids in comparison with surface amino acids [66, ...]. This also explains a seemingly striking difference in hydrophilicity between the residues of groups II and III. Indeed, amino-acid residues are distributed nonhomogeneously in proteins: polar residues prefer surface positions, whereas nonpolar residues are more often found in the protein core. That is why the mostly polar group II shows higher mobility ratios than the mostly nonpolar group III. On the other hand, the high mobility ratios of the nonpolar residues Gly and Ala suggest that the environment effect is not the only factor. The amplitude of fluctuations is inversely proportional to the effective masses $M_i$, which are built from the residue masses (see Eq. (9)). As a result, the largest fluctuations are associated with Gly and Ala, the lightest residues, whereas the smallest fluctuations are associated with Tyr and Trp, the heaviest residues (Fig. 1).

Comparing fluctuations of binding-site (interface) residues with those of other surface residues, we found that, although on average the interface is less mobile than the rest of the protein surface (Fig. 3), the noticeable differences $\mathcal{R}^{sur}_j - \mathcal{R}^{int}_j$ (where $\mathcal{R}^{int}_j$ and $\mathcal{R}^{sur}_j$ are the mobility ratios of residue type j at the interface and elsewhere on the surface) relate to Gly, Ala, Ser, Cys, Leu and Trp. Four of these residues (Gly, Ala, Leu and Ser) are the most common residues at protein interfaces, while Cys and Trp are the least frequent interface residues [...]. The most conserved interface residue, Trp, is also the most stable one (see Fig. 3). Two other highly conserved interface residues (Met and Phe) [...] show decreased mobility in binding sites to a lesser extent. Note that the difference between binding sites and the rest of the protein surface relates mainly to fluctuations of nonpolar residues, with the exception of Ser, a polar residue. These results agree with the experimental observation of reduced fluctuations in the binding sites of myoglobin [...] and bacteriorhodopsin [68] in comparison with the rest of these macromolecules. Frauenfelder and McMahon [69] also noted that four (Leu29, Phe43, Val68 and Ile107) of the six residues with reduced fluctuations surrounding the oxygen molecule are nonpolar. The two other residues are His64 and His93 ($\mathcal{R}_{\mathrm{His}} - \mathcal{R}_{\ldots} = 0.16$). The solvent-mediated attraction between nonpolar residues of a receptor and a ligand gives rise to the hydrophobic contribution to the binding free energy, which is considered one of the major factors stabilizing protein-protein complexes [70-73]. We suppose that Gly, Ala, Ser, Cys, Leu and Trp form low-mobility surface "pads" that constitute a "landing ground" for binding proteins.

FIG. 2: The mobility ratios of surface and core protein residues. Surface (core) residues are defined here as those residues which have a relative solvent accessible surface area higher (lower) than 25%.

The greater ability of Group I residues to fluctuate provides insight into why sequences rich in Gly, Ala, Ser, Pro and Asp do not fold into regular secondary structure elements (α-helices or β-strands). High mobility prevents the formation of long-range order, thus favoring irregular secondary structure elements (loops). We computed the correlation coefficient between the mobility ratios and the corresponding percentages of amino-acid residues in the bank of loops [...] (see Fig. 4). The analysis showed a significant relationship, with a correlation coefficient of 0.9.
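For reference, the Pearson correlation coefficient and the usual t statistic for testing it can be computed as below. This assumes the correlation is taken over the twenty amino-acid types (mean MR against loop percentage); the paired values themselves are not reproduced here, so the usage line only evaluates the test statistic for r = 0.9 and n = 20.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


print(round(t_statistic(0.9, 20), 2))  # about 8.76 with 18 degrees of freedom
```

A t value near 8.8 with 18 degrees of freedom lies far beyond the two-sided 0.1% critical value (about 3.92), consistent with describing the relationship as significant.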

We suggest that the same reasoning explains features of the amino-acid distributions observed in chameleon sequences [...] and disordered proteins [...]. Indeed, highly and moderately fluctuating amino-acid residues (in particular, Gly, Ala, Ser, Glu and Lys) are abundant in disordered and "dual personality" protein fragments, whereas residues with low mobility ratios (e.g., Tyr, Trp, Phe, Ile) are rarely found there [...].

FIG. 3: The mobility ratios of interface and non-interface surface protein residues.

Statistics of protein residues in chameleon sequences show that Ala, Ile, Leu and Val are the most frequent residues in chameleon sequences [...]. Since only Ala belongs to Group I of highly fluctuating residues (Fig. 1), we can hypothesize that an instability driving helix ↔ sheet transitions may often originate at Ala residues when the other highly fluctuating residues are absent. The frequencies of occurrence of Gly and Ser residues increase with the length of the sequence [...]. Thus, in general, chameleon sequences may have several islands of instability. By exciting these islands locally (e.g., by mutations that change interactions of the islands with the rest of the protein, or by ligands bound in the vicinity of the chameleon sequence), one could trigger a helix ↔ sheet transition. Mutations of a chameleon sequence that significantly change the mobility ratio at a sequence position can also provoke such transitions. It has been reported that a single mutation from Pro to Ala ($\mathcal{R}_{\mathrm{Ala}} - \mathcal{R}_{\mathrm{Pro}} = 0.4$) converts a β-sheet into an α-helix [23]. Mutation of two consecutive residues from Phe28Phe29 to Pro28Ile29 ($\mathcal{R}_{\mathrm{Pro}} - \mathcal{R}_{\mathrm{Phe}} = 0.7$, $\mathcal{R}_{\mathrm{Ile}} - \mathcal{R}_{\mathrm{Phe}} = 0.2$) converts an α-helix into a β-sheet [25].

According to Eq. (9), the mean-square fluctuations grow linearly with temperature, and the mobility ratio of Eq. (10) identifies the residues that carry the largest share of this growth. Therefore, we could expect that at the very early stages of protein thermal denaturation, amino-acid residues with an enhanced ability to fluctuate (Group I) and their structural neighbors will form the first seeds of the unfolded phase. Since the majority of Group I amino acids (Gly, Ser, Pro and Asp) show higher propensities for loops than for helices or sheets [6], it is possible that the nucleation of the unfolded phase starts on protein loops. Due to their increased ability to fluctuate, Group I residues may also be involved more often than other residues in equilibrium local folding-unfolding reactions scattered over the protein surface [...].

FIG. 4: The mobility ratios of protein residues against their percentage compositions in protein loops [...].

IV. CONCLUSIONS 

The current work focuses on the fundamental relationship between the protein sequence, the ability to fluctuate and the functionality of protein structures. We have considered this relationship within the framework of a novel elastic network model that accounts for the distribution of inter-atomic interactions within a coarse-grained approach. The model modifies the commonly used form of the Tirion potential with a spring constant proportional to the number of inter-atomic contacts between residues. We demonstrated that two factors, the protein mass distribution and the residue environment, determine the scale of fluctuations. Surface residues undergo larger fluctuations than core residues, in agreement with experimental observations [63, 64]. On average, the protein interface is less mobile than the rest of the protein surface and contains low-mobility pads associated mainly with nonpolar residues. We hypothesize that the conformational instability of protein loops, chameleon sequences and disordered proteins relates to a high content of highly mobile residues and a lack of weakly fluctuating residues. The results show a high correlation between fluctuations and the sequence composition of protein loops. Analysis of residue fluctuations and their propensities in secondary structure elements suggests that, upon thermal denaturation, the nucleation of the unfolded phase proceeds from protein loops. The results provide insight into structural fluctuations of proteins and facilitate a better understanding of protein association mechanisms.